Notebook version of demo_load_show_sheet.py

To follow along, start the notebook server first, either by running jupyter notebook at the command line or by launching it from the Anaconda tool on Windows.



In [10]:

    
import pandas as pd

df = pd.read_excel("sheet_1_with_simple_logic.xls")
print(df)









    



   Feature1  Feature2 DecisionF1F2         TVShow  Decision2 Decision3  \
0       0.6       0.6         True      Hollyoaks          1      True   
1       0.4       0.6        False      hollyoaks          1     False   
2       0.3       0.4        False     Hollyoaks           0     False   
3       0.9       0.8         True      hollyoaks          1      True   
4       0.9       0.8         True     holly-oaks          0     False   
5       0.9       0.8         True  best TV shows          0     False   

                    Comment  
0                       NaN  
1                       NaN  
2  trailing-space on TVShow  
3                       NaN  
4        badly-spelt TVShow  
5                       NaN
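If the spreadsheet file is not to hand, the later cells can still be tried against a stand-in frame. The sketch below retypes the printed values into a dictionary; the name df_standin is hypothetical, and the trailing space in row 2's TVShow is inferred from the Comment column rather than visible in the printout.

```python
import pandas as pd

# Stand-in for sheet_1_with_simple_logic.xls, retyped from the printed output.
# Assumption: row 2's TVShow ends with a space, as its Comment suggests.
data = {
    "Feature1": [0.6, 0.4, 0.3, 0.9, 0.9, 0.9],
    "Feature2": [0.6, 0.6, 0.4, 0.8, 0.8, 0.8],
    "DecisionF1F2": [True, False, False, True, True, True],
    "TVShow": ["Hollyoaks", "hollyoaks", "Hollyoaks ",
               "hollyoaks", "holly-oaks", "best TV shows"],
    "Decision2": [1, 1, 0, 1, 0, 0],
    "Decision3": [True, False, False, True, False, False],
    "Comment": [None, None, "trailing-space on TVShow",
                None, "badly-spelt TVShow", None],
}
df_standin = pd.DataFrame(data)

# Same shape as the sheet: 6 rows, 7 columns
print(df_standin.shape)
```

Values read from the real file may differ in the last decimal place, so treat this only as a convenience for experimenting.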



In [11]:

    
df.head()  # as the last line of a cell this renders a table view (non-interactive, but prettier than print)
# NOTE: head() shows 5 rows by default and we have 6, so the last row is not shown









    Out[11]:

   Feature1  Feature2  DecisionF1F2      TVShow  Decision2  Decision3                   Comment
0       0.6       0.6          True   Hollyoaks          1       True                       NaN
1       0.4       0.6         False   hollyoaks          1      False                       NaN
2       0.3       0.4         False   Hollyoaks          0      False  trailing-space on TVShow
3       0.9       0.8          True   hollyoaks          1       True                       NaN
4       0.9       0.8          True  holly-oaks          0      False        badly-spelt TVShow



In [19]:

    
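df.head(10)  # asking for up to 10 rows shows all 6
# (this cell was re-run after Feature1_Times_2 had been added, hence the extra column in its output)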









    Out[19]:

   Feature1  Feature2  DecisionF1F2         TVShow  Decision2  Decision3                   Comment  Feature1_Times_2
0       0.6       0.6          True      Hollyoaks          1       True                       NaN               1.2
1       0.4       0.6         False      hollyoaks          1      False                       NaN               0.8
2       0.3       0.4         False      Hollyoaks          0      False  trailing-space on TVShow               0.6
3       0.9       0.8          True      hollyoaks          1       True                       NaN               1.8
4       0.9       0.8          True     holly-oaks          0      False        badly-spelt TVShow               1.8
5       0.9       0.8          True  best TV shows          0      False                       NaN               1.8



In [12]:

    
print("Column names:", df.columns)









    



Column names: Index(['Feature1', 'Feature2', 'DecisionF1F2', 'TVShow', 'Decision2',
       'Decision3', 'Comment'],
      dtype='object')



In [13]:

    
print("Information about each column, including data types:")
print("(note - type 'object' is a catch-all that includes strings)")
df.info()



Information about each column, including data types:
(note - type 'object' is a catch-all that includes strings)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 7 columns):
Feature1        6 non-null float64
Feature2        6 non-null float64
DecisionF1F2    6 non-null bool
TVShow          6 non-null object
Decision2       6 non-null int64
Decision3       6 non-null bool
Comment         2 non-null object
dtypes: bool(2), float64(2), int64(1), object(2)
memory usage: 300.0+ bytes



In [14]:

    
print("\nWe can extract a column of data as a Series object:")
print(df['Feature1'])









    



We can extract a column of data as a Series object:
0    0.6
1    0.4
2    0.3
3    0.9
4    0.9
5    0.9
Name: Feature1, dtype: float64



In [15]:

    
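row = df.loc[0]  # .loc selects by index label (.ix was deprecated and has been removed from pandas)
print("\nWe can extract a row as a Series, which can be indexed by column name like a dictionary:")
print(row)



We can extract a row as a Series, which can be indexed by column name like a dictionary: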
Feature1              0.6
Feature2              0.6
DecisionF1F2         True
TVShow          Hollyoaks
Decision2               1
Decision3            True
Comment               NaN
Name: 0, dtype: object
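The extracted row is a pandas Series whose index holds the column names, not a real dict. A minimal sketch of the two usual row selectors and of converting to a plain dict follows; the frame small and its index labels 10 and 20 are hypothetical, chosen so that label and position differ.

```python
import pandas as pd

# Hypothetical two-row frame with a non-default index
small = pd.DataFrame(
    {"Feature1": [0.6, 0.4], "TVShow": ["Hollyoaks", "hollyoaks"]},
    index=[10, 20],
)

by_label = small.loc[10]     # the row whose index label is 10
by_position = small.iloc[0]  # the first row, whatever its label

# Both select the same row here, and both are Series objects
print(type(by_label).__name__)
print(by_label.equals(by_position))

# Convert to a plain Python dict when one is really needed
row_dict = by_label.to_dict()
print(row_dict["TVShow"])
```

With the default 0..n-1 index used in this notebook, .loc[0] and .iloc[0] return the same row.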



In [16]:

    
print("\nRow items, e.g. Feature1={feature1}".format(feature1=row['Feature1']))









    



Row items, e.g. Feature1=0.6000000000000001
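The long tail of digits is ordinary binary floating point at work: the value held in the cell is a double that sits just above 0.6, and printing a single value shows its full representation, whereas the table views round for display. A small sketch, using the printed value as a literal, of tidying it up for output:

```python
# The value exactly as printed above
value = 0.6000000000000001

# It is a different double from the literal 0.6
print(value == 0.6)

# A format spec controls how many decimal places are shown
print("Row items, e.g. Feature1={feature1:.1f}".format(feature1=value))

# round() returns the nearest double to the rounded decimal value
print(round(value, 1) == 0.6)
```

Formatting only changes what is displayed; the value stored in the DataFrame is untouched.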



In [17]:

    
def multiply_feature1_by_2(cell):
    return cell * 2

# we'll apply a function cell-by-cell to each cell in a Series (we pull out the Feature1 Series)
df['Feature1'].apply(multiply_feature1_by_2)
# note this doesn't change the DataFrame, it generates a new separate Series
# and here we just print it and then discard it









    Out[17]:





0    1.2
1    0.8
2    0.6
3    1.8
4    1.8
5    1.8
Name: Feature1, dtype: float64
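For simple arithmetic like this, pandas can also operate on the whole Series at once, without a Python function per cell. The sketch below compares the two approaches on a stand-in Series (feature1 is retyped from the output above, not read from the file).

```python
import pandas as pd

# Stand-in for df['Feature1']
feature1 = pd.Series([0.6, 0.4, 0.3, 0.9, 0.9, 0.9], name="Feature1")

def multiply_feature1_by_2(cell):
    return cell * 2

# Cell-by-cell, as in the notebook
applied = feature1.apply(multiply_feature1_by_2)

# Whole-Series arithmetic; same values, usually faster on large data
vectorised = feature1 * 2

print(applied.equals(vectorised))
```

apply remains the more general tool when the per-cell logic cannot be written as Series arithmetic.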



In [20]:

    
# we can assign the result back to the DataFrame as a new column
new_result = df['Feature1'].apply(multiply_feature1_by_2)
df['Feature1_Times_2'] = new_result
df.head(10)









    Out[20]:

   Feature1  Feature2  DecisionF1F2         TVShow  Decision2  Decision3                   Comment  Feature1_Times_2
0       0.6       0.6          True      Hollyoaks          1       True                       NaN               1.2
1       0.4       0.6         False      hollyoaks          1      False                       NaN               0.8
2       0.3       0.4         False      Hollyoaks          0      False  trailing-space on TVShow               0.6
3       0.9       0.8          True      hollyoaks          1       True                       NaN               1.8
4       0.9       0.8          True     holly-oaks          0      False        badly-spelt TVShow               1.8
5       0.9       0.8          True  best TV shows          0      False                       NaN               1.8
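The same apply-then-assign pattern works for text columns. As a hedged sketch, the function below (tidy_show_name and the TVShow_Tidy column are hypothetical names) tidies the kind of case and whitespace differences flagged in the Comment column; it is run on a small stand-in frame and makes no attempt to fix misspellings.

```python
import pandas as pd

# Stand-in for the TVShow column; the third entry has a trailing space
shows = pd.DataFrame(
    {"TVShow": ["Hollyoaks", "hollyoaks", "Hollyoaks ", "holly-oaks"]}
)

def tidy_show_name(cell):
    # Remove surrounding whitespace and lower-case the name
    return cell.strip().lower()

# Assign the result back as a new column, leaving the original intact
shows["TVShow_Tidy"] = shows["TVShow"].apply(tidy_show_name)
print(shows["TVShow_Tidy"].tolist())
```

Keeping the original column alongside the tidied one makes it easy to check what changed.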


